Zhi Chen

Spring

$τ_0$-WM: A Unified Video-Action World Model for Robotic Manipulation

May 31, 2026

Pengfei Zhou, Shengcong Chen, Di Chen, Jiaxu Wang, Rongjun Jin, Bingwen Zhu, Yike Pan, Songen Gu, Kuanning Wang, Shufeng Nan(+10 more)

Abstract:Robotic manipulation requires models that generate executable actions while anticipating and evaluating their future consequences before physical execution. We present $τ_0$-World Model ($τ_0$-WM), a unified video-action world model that integrates policy learning, video prediction, and action evaluation within a single future-predictive framework. Built on a shared video diffusion backbone, $τ_0$-WM provides two complementary interfaces. First, a video action model jointly predicts future visual latents and continuous action chunks from multi-view observations, language instructions, and robot state. Second, an action-conditioned video simulator rolls out candidate action chunks into multi-view futures and predicts dense task-progress scores. The model is trained on approximately $27{,}300$ hours of real-robot teleoperation, UMI-style interaction, egocentric human videos, and rollout or failure trajectories using modality-specific supervision masks. At inference time, $τ_0$-WM uses test-time computation to sample action candidates, rank them with re-denoising consistency, and invoke simulator-based rectification for low-quality candidates. On challenging long-horizon and fine-grained robotic manipulation tasks, $τ_0$-WM shows superior performance over other relevant baselines.

* Our project homepage: https://finch.agibot.com/research/tau0-wm

CCLab: Adversarial Testing of Learning- and Non-Learning-Based Congestion Controllers

May 21, 2026

Zhi Chen, Shehab Sarar Ahmed, Chenkai Wang, Brighten Godfrey, Gang Wang

Abstract:Congestion controllers (CCs) are critical to network performance, and yet their robustness under adverse conditions remains insufficiently understood. While recent learning-based CCs have demonstrated strong performance in controlled environments, it is unclear how they compare to traditional CCs when controllers' input signals are corrupted or when environmental conditions become systematically challenging. In this paper, we introduce CCLab, an adversarial testing framework for systematically evaluating the robustness of both learning-based and non-learning-based CCs. CCLab includes a reinforcement learning (RL)-based adversarial agent that operates in a closed loop with the congestion control policy, generating bounded perturbations either on input signals (feature-level) or on external network conditions (environment-level), while preserving realism through explicit constraints. Using this framework, we compare learning-based CCs with non-learning-based CCs under both feature-level and environment-level adversarial conditions. While both types of CCs suffer from performance degradation under adversarial testing, we find that learning-based CCs, in general, are more robust than traditional human-designed algorithms. Finally, we show that our adversarial traces can be used to train more robust CCs that outperform existing learning-based CCs under both challenging and normal conditions.

* 13 pages for main paper, 16 pages in total

Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents

May 18, 2026

Wei Ma, Zhi Chen, Jingxu Gu, Tianling Li, Shangqing Liu, Lingxiao Jiang

Abstract:Behavioral studies of LLM-based software engineering agents extract operational rules about which trajectory shapes correlate with higher resolution rates: that a test step follows a code modification, that error cascades are short, or that trajectories are compact. Each rule is typically derived from a single framework, and whether it transfers, in sign as well as magnitude, to structurally different agent designs has not been directly tested. We address this at ecosystem scale: 64,380 SWE-bench runs from 126 agent configurations spanning 43 frameworks, where each configuration pairs an LLM with a framework (e.g., SWE-Agent, OpenHands) that supplies its tools and workflow. We separate framework effects from LLM effects by holding each layer fixed in turn, then measure one behavior-outcome effect per configuration and examine how those effects agree or disagree. Swapping the framework while the LLM is held fixed produces large behavioral differences in every action feature. On most signals, configurations disagree not merely in magnitude but in direction. Error rate is the cleanest case: 47 configurations resolve more issues when their error rate is lower, while 48 resolve more when it is higher. Five other continuous features and three of seven binary patterns from prior SE literature show similar directional disagreement. Framework identity accounts for more of this variation than LLM family: for mean turns, framework explains 64% of the between-configuration variance against the LLM's 10%. The implication is that the same observable behavioral signal can carry opposite meaning for different agent configurations. Behavioral findings from any single framework therefore warrant cross-configuration validation before being claimed as general.

Joint Communication and Trajectory Design for Movable Antenna Systems

May 05, 2026

Jiaxuan Li, Weidong Mei, Changhao Liu, Zhi Chen, Boyu Ning, Rui Zhang

Abstract:Movable antennas (MAs) have attracted significant attention in wireless communications due to their ability to reconfigure channel conditions by flexibly adjusting the antenna positions within a confined region. However, MA movement generally incurs a non-negligible delay, which may significantly limit the data transmission time at optimized positions. To tackle this challenge, this paper investigates a new joint communication and trajectory optimization problem, where each MA transmits while moving along an optimized trajectory to prolong the effective data transmission time. Focusing on a single-MA system, our goal is to maximize the average data rate by optimizing the MA's positions over time, subject to its maximum velocity constraints. However, this continuous-time antenna position optimization problem is highly non-convex and challenging to solve. To solve it, we first consider a special case with two channel paths and derive the optimal MA trajectory in closed form. For other general cases, we reformulate the average rate maximization problem as a fixed-hop shortest path problem in graph theory by sampling the antenna movement region into a multitude of discrete points, and solve it optimally. Simulation results demonstrate that our proposed algorithm can significantly improve the data rate compared to other baseline schemes.
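
The fixed-hop shortest path idea in this abstract can be illustrated with a small dynamic program. This is only a sketch under simplifying assumptions that are not taken from the paper: the movement region is a 1D grid, time is split into equal slots, the velocity limit becomes a maximum number of grid points moved per slot (`max_step`), and the per-slot rate table `rates` is supplied by the caller. The function name `best_trajectory` is hypothetical.

```python
def best_trajectory(rates, max_step, start):
    """Maximize the average rate over a fixed number of time slots.

    rates[t][i] : rate achieved in slot t at grid point i (assumed given)
    max_step    : max grid points the antenna may move between slots
    start       : grid index of the antenna in slot 0
    """
    num_slots, num_points = len(rates), len(rates[0])
    neg_inf = float("-inf")

    # value[i] = best total rate of any feasible path ending at point i
    value = [neg_inf] * num_points
    value[start] = rates[0][start]
    parents = []  # parents[t - 1][i] = predecessor of point i in slot t

    for t in range(1, num_slots):
        new_value = [neg_inf] * num_points
        parent = [-1] * num_points
        for i in range(num_points):
            lo, hi = max(0, i - max_step), min(num_points - 1, i + max_step)
            for j in range(lo, hi + 1):  # only moves within the speed limit
                if value[j] != neg_inf and value[j] + rates[t][i] > new_value[i]:
                    new_value[i] = value[j] + rates[t][i]
                    parent[i] = j
        parents.append(parent)
        value = new_value

    # Backtrack from the best final point to recover the trajectory.
    end = max(range(num_points), key=lambda i: value[i])
    path = [end]
    for parent in reversed(parents):
        path.append(parent[path[-1]])
    path.reverse()
    return value[end] / num_slots, path
```

Every path has exactly `len(rates) - 1` hops, which is what makes this a fixed-hop problem; negating the rates turns the maximization into a shortest path search over the same layered graph.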

Prior-Agnostic Robust Forecast Aggregation

Apr 27, 2026

Zhi Chen, Cheng Peng, Wei Tang

Abstract:Robust forecast aggregation combines the predictions of multiple information sources to perform well in the worst case across all possible information structures. Previous work largely focuses on settings with a known binary state space, where the state is either 0 or 1. We study prior-agnostic robust forecast aggregation in which the aggregator observes only experts' reports, yet is ignorant of both the underlying joint information structure and the full prior, including the underlying state space. Unlike the standard model that fixes the binary state space {0, 1}, we allow the (binary) unknown state values to be arbitrary numbers in [0, 1], so the same reported probability may correspond to very different realized outcome frequencies across environments. Our main contribution is a simple, explicit, closed-form log-odds aggregator that linearly pools forecasts in logit space, together with (nearly-)tight minimax-regret guarantees across three knowledge regimes. We first show that under conditionally independent (CI) signals, robust aggregation with an unknown state space is strictly harder than in the known-state setting by establishing a larger lower bound, and our aggregation rule can achieve a worst-case regret of 0.0255. Along the way, we also characterize tight regret bounds for Blackwell-ordered structures and for general information structures. In the classical setting with known state space {0, 1}, our aggregator achieves regret strictly below 0.0226 for CI structures. To the best of our knowledge, this is the first explicit closed-form aggregator that achieves a regret upper bound strictly less than 0.0226. Finally, we extend the model to the setting where the aggregator additionally knows each expert's marginal forecast distribution; in this setting, under CI structures, we show that a generalized log-odds rule achieves regret of 0.0228, complemented by a lower bound of 0.0225.
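
Linear pooling in logit space, as mentioned in this abstract, can be sketched in a few lines. The abstract does not give the weights or offset of the authors' rule, so the `weights` and `bias` parameters below (and the equal-weight default) are placeholders for illustration, not the aggregator from the paper.

```python
import math


def logit(p, eps=1e-9):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def log_odds_pool(forecasts, weights=None, bias=0.0):
    """Aggregate forecasts as sigmoid(bias + sum_i w_i * logit(p_i)).

    weights and bias are hypothetical knobs; the default is an
    equal-weight average of the experts' log-odds.
    """
    if weights is None:
        weights = [1.0 / len(forecasts)] * len(forecasts)
    z = bias + sum(w * logit(p) for w, p in zip(weights, forecasts))
    return sigmoid(z)
```

With equal weights that sum to one, two experts who both report 0.8 aggregate to 0.8. With weights `[1.0, 1.0]` the log-odds add up, so the same reports aggregate to 16/17, which is how a logit-space rule can extrapolate beyond every individual forecast.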

MedFlowSeg: Flow Matching for Medical Image Segmentation with Frequency-Aware Attention

Apr 21, 2026

Zhi Chen, Runze Hu, Le Zhang

Abstract:Flow matching has recently emerged as a principled framework for learning continuous-time transport maps, enabling efficient deterministic generation without relying on stochastic diffusion processes. While generative modeling has shown promise for medical image segmentation, particularly in capturing uncertainty and complex anatomical variability, existing approaches are predominantly built upon diffusion models, which incur substantial computational overhead due to iterative sampling and are often constrained by UNet-based parameterizations. In this work, we introduce MedFlowSeg, a conditional flow matching framework that formulates medical image segmentation as learning a time-dependent vector field that transports a simple prior distribution to the target segmentation distribution. This formulation enables one-step deterministic inference while preserving the expressiveness of generative modeling. We further develop a dual-conditioning mechanism to incorporate structured priors into the learned flow. Specifically, we propose a Dual-Branch Spatial Attention module that injects multi-scale structural information into the flow field, and a Frequency-Aware Attention module that models cross-domain interactions between spatial and spectral representations via discrepancy-aware fusion and time-dependent modulation. Together, these components provide an effective parameterization of conditional flows that capture both global anatomical structure and fine-grained boundary details. We provide extensive empirical validation across multiple medical imaging modalities, demonstrating that MedFlowSeg achieves state-of-the-art performance while significantly reducing computational cost compared to diffusion-based methods. Our results highlight the potential of flow matching as a theoretically grounded and computationally efficient alternative for generative medical image segmentation.

Executing as You Generate: Hiding Execution Latency in LLM Code Generation

Apr 01, 2026

Zhensu Sun, Zhihao Lin, Zhi Chen, Chengran Yang, Mingyi Zhou, Li Li, David Lo

Abstract:Current LLM-based coding agents follow a serial execution paradigm: the model first generates the complete code, then invokes an interpreter to execute it. This sequential workflow leaves the executor idle during generation and the generator idle during execution, resulting in unnecessary end-to-end latency. We observe that, unlike human developers, LLMs produce code tokens sequentially without revision, making it possible to execute code as it is being generated. We formalize this parallel execution paradigm, modeling it as a three-stage pipeline of generation, detection, and execution, and derive closed-form latency bounds that characterize its speedup potential and operating regimes. We then present Eager, a concrete implementation featuring AST-based chunking, dynamic batching with gated execution, and early error interruption. We evaluate Eager across four benchmarks, seven LLMs, and three execution environments. Results show that Eager reduces the non-overlapped execution latency by up to 99.9% and the end-to-end latency by up to 55%.

* 10 pages
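
To make the idea of AST-based chunking concrete, here is a minimal single-threaded sketch using Python's standard `ast` module. It is not the Eager implementation: the function name `run_streaming`, the rule "a top-level statement is complete once another statement follows it", and the plain iterator standing in for an LLM token stream are all assumptions made for illustration. Dynamic batching and running the executor in parallel with generation are omitted.

```python
import ast


def run_streaming(chunks, namespace=None):
    """Execute top-level statements as soon as they are known to be complete."""
    namespace = {} if namespace is None else namespace
    buffer = ""
    done = 0  # number of top-level statements already executed

    def execute_ready(final):
        nonlocal done
        # Before the stream ends, only look at fully received lines.
        complete = buffer if final else buffer[: buffer.rfind("\n") + 1]
        try:
            tree = ast.parse(complete)
        except SyntaxError:
            if final:
                raise  # the finished program is itself invalid
            return  # a statement is still incomplete; wait for more text
        # The last statement may still grow (e.g. a loop body), so hold it
        # back until the stream is finished.
        ready = tree.body if final else tree.body[:-1]
        for stmt in ready[done:]:
            module = ast.Module(body=[stmt], type_ignores=[])
            # A runtime error propagates here and stops consuming the stream.
            exec(compile(module, "<stream>", "exec"), namespace)
            done += 1

    for chunk in chunks:
        buffer += chunk
        if "\n" in chunk:
            execute_ready(final=False)
    execute_ready(final=True)
    return namespace
```

For example, feeding the chunks `["x = 1\n", "y = x ", "+ 1\nfor i in range(3):\n", "    y += i\n", "z = y * 2"]` runs `x = 1` and `y = x + 1` before the loop body has fully arrived, and the returned namespace ends with `y == 5` and `z == 10`.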

Terahertz Beam Squint Mitigation via Six-Dimensional Movable Antennas

Mar 25, 2026

Yike Xie, Weidong Mei, Dong Wang, Yingqi Wen, Zhi Chen, Jun Fang, Wei Guo, Boyu Ning

Abstract:Analog beamforming holds great potential for future terahertz (THz) communications due to its ability to generate high-gain directional beams with low-cost phase shifters. However, conventional analog beamforming may suffer substantial performance degradation in wideband systems due to the beam squint effect. Instead of relying on high-cost true-time delayers, we propose an efficient six-dimensional movable antenna (6DMA) architecture to mitigate the beam squint effect. In particular, we study a wideband wide-beam coverage problem in this paper, aiming to maximize the minimum beamforming gain over a given range of azimuth/elevation angles and frequencies by jointly optimizing the analog beamforming vector, the MA positions within a two-dimensional (2D) region, and the three-dimensional (3D) rotation angles of the antenna array. However, this problem is non-convex and intractable to solve optimally due to the coupling of the spatial and frequency domains and that of the antenna weights, positions, and rotation. To tackle this problem, we first derive an optimal solution in a special case with azimuth or elevation angle coverage only. It is shown that rotating a uniform linear array (ULA) is sufficient to achieve global optimality and eliminate beam squint effects. For other general cases, an alternating optimization (AO) algorithm is proposed to obtain a high-quality suboptimal solution, where the antennas' beamforming weights, positions, and rotation angles are alternately optimized by combining successive convex approximation (SCA), sequential update with Gibbs sampling (GS), and hybrid coarse- and fine-grained search. Simulation results demonstrate that our proposed scheme can significantly outperform conventional antenna arrays without antenna movement or rotation, thus offering a cost-effective solution for wideband transmission over THz bands.

UniSem: Generalizable Semantic 3D Reconstruction from Sparse Unposed Images

Mar 18, 2026

Guibiao Liao, Qian Ren, Kaimin Liao, Hua Wang, Zhi Chen, Luchao Wang, Yaohua Tang

Abstract:Semantic-aware 3D reconstruction from sparse, unposed images remains challenging for feed-forward 3D Gaussian Splatting (3DGS). Existing methods often predict an over-complete set of Gaussian primitives under sparse-view supervision, leading to unstable geometry and inferior depth quality. Meanwhile, they rely solely on 2D segmenter features for semantic lifting, which provides weak 3D-level and limited generalizable supervision, resulting in incomplete 3D semantics in novel scenes. To address these issues, we propose UniSem, a unified framework that jointly improves depth accuracy and semantic generalization via two key components. First, Error-aware Gaussian Dropout (EGD) performs error-guided capacity control by suppressing redundancy-prone Gaussians using rendering error cues, producing meaningful, geometrically stable Gaussian representations for improved depth estimation. Second, we introduce a Mix-training Curriculum (MTC) that progressively blends 2D segmenter-lifted semantics with the model's own emergent 3D semantic priors, implemented with object-level prototype alignment to enhance semantic coherence and completeness. Extensive experiments on ScanNet and Replica show that UniSem achieves superior performance in depth prediction and open-vocabulary 3D segmentation across varying numbers of input views. Notably, with 16-view inputs, UniSem reduces depth Rel by 15.2% and improves open-vocabulary segmentation mAcc by 3.7% over strong baselines.

MV2UV: Generating High-quality UV Texture Maps with Multiview Prompts

Mar 16, 2026

Zheng Zhang, Qinchuan Zhang, Yuteng Ye, Zhi Chen, Penglei Ji, Mengfei Li, Wenxiao Zhang, Yuan Liu

Abstract:Generating high-quality textures for 3D assets is a challenging task. Existing multiview texture generation methods suffer from multiview inconsistency and missing textures on unseen parts, while UV texture inpainting methods do not generalize well due to insufficient UV data and cannot make good use of 2D image diffusion priors. In this paper, we propose a new method called MV2UV that combines 2D generative priors from multiview generation with the inpainting ability of UV refinement to produce high-quality texture maps. Our key idea is to adopt a UV space generative model that simultaneously inpaints unseen parts of multiview images while resolving the inconsistency of multiview images. Experiments show that our method achieves better texture generation quality than existing methods, especially in unseen, occluded, and multiview-inconsistent parts.
